    Introduction to Protein Structure Prediction

    This chapter gives a graceful introduction to the problem of protein three-dimensional structure prediction, and focuses on how to make structural sense out of a single input sequence with unknown structure, the 'query' or 'target' sequence. We give an overview of the different classes of modelling techniques, notably template-based and template-free. We also discuss the way in which structural predictions are validated within the global community, and elaborate on the extent to which predicted structures may be trusted and used in practice. Finally, we discuss whether the concept of a single fold pertaining to a protein structure is sustainable given recent insights. In short, we conclude that the general protein three-dimensional structure prediction problem remains unsolved, especially if we desire quantitative predictions. However, if a homologous structural template is available in the PDB, a model of reasonable to high accuracy may be generated.

    PhyloPars: estimation of missing parameter values using phylogeny

    A wealth of information on metabolic parameters of a species can be inferred from observations on species that are phylogenetically related. Phylogeny-based information can complement direct empirical evidence, and is particularly valuable if experiments on the species of interest are not feasible. The PhyloPars web server provides a statistically consistent method that combines an incomplete set of empirical observations with the species phylogeny to produce a complete set of parameter estimates for all species. It builds upon a state-of-the-art evolutionary model, extended with the ability to handle missing data. The resulting approach makes optimal use of all available information to produce estimates that can be an order of magnitude more accurate than ad-hoc alternatives. Uploading a phylogeny and incomplete feature matrix suffices to obtain estimates of all missing values, along with a measure of certainty. Real-time cross-validation provides further insight in the accuracy and bias expected for estimated values. The server allows for easy, efficient estimation of metabolic parameters, which can benefit a wide range of fields including systems biology and ecology. PhyloPars is available at: http://www.ibi.vu.nl/programs/phylopars/

    Scooby-domain: prediction of globular domains in protein sequence

    Scooby-domain (sequence hydrophobicity predicts domains) is a fast and simple method to identify globular domains in protein sequence, based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method successfully identifies sequence regions that will form a globular structure and those that are likely to be unstructured. The method does not rely on homology searches and, therefore, can identify previously unknown domains for structural elucidation. Scooby-domain is available as a Java applet at . It may be used to visualize local properties within a protein sequence, such as average hydrophobicity, secondary structure propensity and domain boundaries, as well as being a method for fast domain assignment of large sequence sets
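
    One of the local properties mentioned above, average hydrophobicity, can be illustrated with a sliding window. The sketch below is not the Scooby-domain algorithm: the function name `window_hydrophobicity`, the default window size and the choice of the Kyte-Doolittle scale are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (a standard published scale; using it here
# is an assumption, the abstract does not say which scale Scooby-domain uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(sequence, window=5):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    # One average per window start position; no padding at the ends.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hydrophobic start (I, L, V) followed by a charged stretch (K, D, E).
    print(window_hydrophobicity("ILVKDE", window=3))
```

    A profile of this kind is positive over hydrophobic stretches and negative over charged ones; how such values are turned into domain predictions is specific to the method and is not reproduced here.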

    Strategies for protein structure model generation

    This chapter deals with approaches for protein three-dimensional structure prediction, starting out from a single input sequence with unknown structure, the 'query' or 'target' sequence. Both template-based and template-free modelling techniques are treated, along with how the resulting structural models may be selected and refined. We give a concrete flowchart for deciding which modelling strategy is best suited to particular circumstances, and which steps need to be taken in each strategy. Notably, the ability to locate a suitable structural template by homology or fold recognition is crucial; without this, models will be of low quality at best. With a template available, the quality of the query-template alignment crucially determines the model quality. We also discuss how other, coarser, experimental data may be incorporated in the modelling process to alleviate the problem of missing template structures. Finally, we discuss measures to predict the quality of the models generated.

    The meaning of alignment: lessons from structural diversity

    Background: Protein structural alignment provides a fundamental basis for deriving principles of functional and evolutionary relationships. It is routinely used for structural classification and functional characterization of proteins and for the construction of sequence alignment benchmarks. However, the available techniques do not fully consider the implications of protein structural diversity and typically generate a single alignment between sequences. Results: We have taken alternative protein crystal structures and generated simulation snapshots to explicitly investigate the impact of structural changes on the alignments. We show that structural diversity has a significant effect on structural alignment. Moreover, we observe alignment inconsistencies even for modest spatial divergence, implying that the biological interpretation of alignments is less straightforward than commonly assumed. A salient example is the GroES 'mobile loop' where sub-Ångstrom variations give rise to contradictory sequence alignments. Conclusion: A comprehensive treatment of ambiguous alignment regions is crucial for further development of structural alignment applications and for the representation of alignments in general. For this purpose we have developed an on-line database containing our data and new ways of visualizing alignment inconsistencies, which can be found at http://www.ibi.vu.nl/databases/stralivari.

    Homology-extended sequence alignment

    We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading
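
    The idea of a position-specific profile can be sketched as per-column residue frequencies. This is a simplified illustration only: it assumes the homologues have already been collected and aligned to the query, and it ignores the score threshold and any sequence weighting. The function name `position_profile` is hypothetical and not taken from the published method.

```python
from collections import Counter


def position_profile(aligned_sequences, gap="-"):
    """Per-column residue frequencies for equal-length aligned sequences."""
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must have equal length")

    profile = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*aligned_sequences):
        counts = Counter(res for res in column if res != gap)
        total = sum(counts.values())
        # An all-gap column yields an empty frequency table.
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return profile


if __name__ == "__main__":
    # Query plus two assumed homologues, already aligned.
    for column in position_profile(["ACD", "ACE", "A-E"]):
        print(column)
```

    Profiles of this kind carry more information per position than a single sequence does, which is the property the profile-profile alignment step relies on.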

    FluxSimulator: An R Package to Simulate Isotopomer Distributions in Metabolic Networks

    The representation of biochemical knowledge in terms of fluxes (transformation rates) in a metabolic network is often a crucial step in the development of new drugs and efficient bioreactors. Mass spectroscopy (MS) and nuclear magnetic resonance spectroscopy (NMRS) in combination with ^13C labeled substrates are experimental techniques resulting in data that may be used to quantify fluxes in the metabolic network underlying a process. The massive amount of data generated by spectroscopic experiments increasingly requires software which models the dynamics of the underlying biological system. In this work we present an approach to handle isotopomer distributions in metabolic networks using an object-oriented programming approach, implemented using S4 classes in R. The developed package is called FluxSimulator and provides a user friendly interface to specify the topological information of the metabolic network as well as carbon atom transitions in plain text files. The package automatically derives the mathematical representation of the formulated network, and assembles a set of ordinary differential equations (ODEs) describing the change of each isotopomer pool over time. These ODEs are subsequently solved numerically. In a case study FluxSimulator was applied to an example network. Our results indicate that the package is able to reproduce exact changes in isotopomer compositions of the metabolite pools over time at given flux rates.
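
    The kind of ODE described above can be illustrated with the smallest possible case: one metabolite pool of constant size, fed and drained by the same flux, with a labelled substrate switched on at time zero. The sketch is written in Python rather than R, does not use or imitate the FluxSimulator interface, and uses explicit Euler integration as an assumption. All function and parameter names are hypothetical.

```python
import math


def simulate_labelled_fraction(flux, pool_size, input_fraction, t_end, dt=0.001):
    """Integrate dx/dt = (flux / pool_size) * (input_fraction - x), x(0) = 0.

    x is the labelled fraction of a single pool of constant size.
    """
    x = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Explicit Euler step of the mass balance for the labelled fraction.
        x += dt * (flux / pool_size) * (input_fraction - x)
    return x


def analytic_labelled_fraction(flux, pool_size, input_fraction, t):
    """Closed-form solution of the same single-pool equation."""
    return input_fraction * (1.0 - math.exp(-flux * t / pool_size))


if __name__ == "__main__":
    numeric = simulate_labelled_fraction(1.0, 2.0, 1.0, t_end=2.0)
    exact = analytic_labelled_fraction(1.0, 2.0, 1.0, t=2.0)
    print(numeric, exact)
```

    For this single pool the numerical result can be checked against the closed-form solution; a realistic network couples many such equations, one per isotopomer pool.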

    Natalie 2.0: Sparse Global Network Alignment as a Special Case of Quadratic Assignment

    Data on molecular interactions is increasing at a tremendous pace, while the development of solid methods for analyzing this network data is still lagging behind. This holds in particular for the field of comparative network analysis, where one wants to identify commonalities between biological networks. Since biological functionality primarily operates at the network level, there is a clear need for topology-aware comparison methods. We present a method for global network alignment that is fast and robust and can flexibly deal with various scoring schemes, taking both node-to-node correspondences and network topologies into account. We exploit the fact that network alignment is a special case of the well-studied quadratic assignment problem (QAP). We focus on sparse network alignment, where each node can be mapped only to a typically small subset of nodes in the other network. This corresponds to a QAP instance with a symmetric and sparse weight matrix. We obtain strong upper and lower bounds for the problem by improving a Lagrangian relaxation approach and introduce the open source software tool Natalie 2.0, a publicly available implementation of our method. In an extensive computational study on protein interaction networks for six different species, we find that our new method outperforms alternative established and recent state-of-the-art methods.

    CGHnormaliter: an iterative strategy to enhance normalization of array CGH data with imbalanced aberrations

    Background: Array comparative genomic hybridization (aCGH) is a popular technique for detection of genomic copy number imbalances. These play a critical role in the onset of various types of cancer. In the analysis of aCGH data, normalization is deemed a critical pre-processing step. In general, aCGH normalization approaches are similar to those used for gene expression data, albeit both data-types differ inherently. A particular problem with aCGH data is that imbalanced copy numbers lead to improper normalization using conventional methods. Results: In this study we present a novel method, called CGHnormaliter, which addresses this issue by means of an iterative normalization procedure. First, provisory balanced copy numbers are identified and subsequently used for normalization. These two steps are then iterated to refine the normalization. We tested our method on three well-studied tumor-related aCGH datasets with experimentally confirmed copy numbers. Results were compared to a conventional normalization approach and two more recent state-of-the-art aCGH normalization strategies. Our findings show that, compared to these three methods, CGHnormaliter yields a higher specificity and precision in terms of identifying the 'true' copy numbers. Conclusion: We demonstrate that the normalization of aCGH data can be significantly enhanced using an iterative procedure that effectively eliminates the effect of imbalanced copy numbers. This also leads to a more reliable assessment of aberrations. An R-package containing the implementation of CGHnormaliter is available at http://www.ibi.vu.nl/programs/cghnormaliterwww. © 2009 van Houte et al; licensee BioMed Central Ltd
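
    The two iterated steps described above (identify provisionally balanced probes, then normalize using only those) can be sketched on a list of log2 ratios. The threshold rule and median centring used here are stand-ins chosen for brevity, not the procedure implemented in CGHnormaliter. The function name `iterative_center` and its parameters are hypothetical.

```python
from statistics import median


def iterative_center(log_ratios, threshold=0.2, max_iter=20, tol=1e-6):
    """Centre log2 ratios on the probes currently judged to be balanced."""
    values = list(log_ratios)
    for _ in range(max_iter):
        # Step 1 (assumed rule): probes close to zero count as balanced.
        balanced = [v for v in values if abs(v) <= threshold]
        if not balanced:
            break
        # Step 2: normalize using the balanced probes only.
        shift = median(balanced)
        values = [v - shift for v in values]
        if abs(shift) < tol:
            break
    return values


if __name__ == "__main__":
    # Four balanced probes offset by 0.1 and six gained probes:
    # an imbalanced profile in which gains outnumber normals.
    data = [0.1] * 4 + [0.9] * 6
    print(median(data))            # a global median lands on the gains
    print(iterative_center(data))  # balanced probes end up at zero
```

    In this toy profile a global median would centre the data on the gained probes, whereas restricting the estimate to the balanced probes leaves the gains visible as gains.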